Abstract: Time series forecasting is an important predictive methodology which can be
applied to a wide range of problems. In particular, forecasting the indoor temperature permits
an improved utilization of the HVAC (Heating, Ventilating and Air Conditioning) systems
in a home and thus better energy efficiency. With this purpose, the paper describes how
to implement an Artificial Neural Network (ANN) algorithm in a low cost system-on-chip
to develop an autonomous intelligent wireless sensor network. The present paper uses a
Wireless Sensor Network (WSN) to monitor and forecast the indoor temperature in a smart
home, based on low resource and low cost microcontroller technology such as the 8051 MCU. An
on-line learning approach, based on the Back-Propagation (BP) algorithm for ANNs, has been
developed for real-time time series learning. It performs the model training with every new
datum that arrives at the system, without saving enormous quantities of data to create a historical
database as usual, i.e., without previous knowledge. To validate the approach,
a simulation study with a Bayesian baseline model has been carried out and compared
against a database from a real application, in order to assess performance and accuracy. The core
of the paper is a new algorithm, based on BP, which is described in detail;
the challenge was how to implement a computationally demanding algorithm in a simple
architecture with very few hardware resources.

Keywords: wireless sensor networks; artificial neural networks; on-line Back-Propagation;
ambient intelligence; energy efficiency

1. Introduction

Wireless Sensor Networks (WSNs) have been widely considered as one of the most promising present
and future technologies. In fact, the latest advances in wireless communication technologies have made
it possible to develop tiny, cheap and smart sensors embedded in a small physical area, with wireless
network capabilities, that provide huge opportunities for a vast variety of applications. Common
examples include industrial process monitoring, machine health monitoring, and the monitoring of physical
and environmental conditions [1]. However, one of their most promising applications is in
smart homes and ambient intelligence, where they make it feasible to provide scalable intelligent networks
of sensors/actuators as new home technologies appear on the market. WSNs can be used to
provide more convenient and intelligent living environments for human beings and can be embedded into
a house to develop an autonomous home network. The present paper uses a WSN to monitor and forecast
the indoor temperature in a smart home, based on low resource and low cost microcontroller technology.

Several studies state that in the European Union about 40% of total primary energy demand corresponds
to buildings’ consumption [2]. At home, more than half of this consumption is due to HVAC
(Heating, Ventilating and Air Conditioning) systems [3]. The indoor temperature is the most crucial
variable that determines the utilization of such systems and thus has a major effect on the overall energy
expenditure. For that reason, it is still necessary to develop new intelligent systems at home to manage
the demand of energy efficiently, considering a plausible balance between consumption and comfort. To
develop such intelligent systems, artificial intelligence techniques, such as forecasting, can be applied. Soft
computing has been widely used in real-life applications [4,5]. Furthermore, the estimation of Artificial
Neural Network (ANN) models by means of machine learning techniques has been applied in a wide
range of applications, including the development of energy systems [2,6-8]. The problem is that
such techniques normally require high computational resources and historical data, and the traditional
training method is based on batch learning, as for example the Back-Propagation (BP) algorithm and its
variants. For most applications, training can take from several minutes to some hours and, in addition, the
learning parameters must be properly chosen to ensure convergence (i.e., learning rate, number of
learning epochs, stopping criteria, etc.). In a batch learning system, when new data are received, a
retraining is performed jointly with the past data, thus consuming a lot of time, as mentioned in [7,9].

Nevertheless, as an alternative, an on-line learning approach could perform the model training with all
new incoming data. In fact, when it is necessary to learn a model from scratch or to adapt a pre-trained
one in a totally unknown scenario, on-line learning algorithms can be applied successfully [10]. Thus,
we refer to Stochastic Gradient Descent Back-Propagation (SGBP) as a particular variant of
BP for sequential or on-line learning applications. In sequential or on-line learning methods, the
training observations are presented sequentially to the learning algorithm. Therefore, when new data
arrive at any time, they are observed and learned by the system. In addition, as soon as the learning
procedure is completed the observations are discarded, so there is no need to store much
historical information, which also implies less need for additional physical storage. In conclusion, the
learning algorithm has no prior knowledge about how many training observations will be presented,
although it is possible to produce a better generalization performance at a very fast learning speed [9]


Sensors 2015, 15 


3 


and it needs fewer computing resources, which fits our idea of integrating this technology in a low
cost embedded system inside a WSN framework.

The present research group is concerned with the design of new
intelligent systems, with few hardware resources, to predict values of strategic variables related to energy
consumption, i.e., low cost and small predictive systems. For that purpose, sequential learning algorithms
have demonstrated their feasibility to achieve such objectives. This also implies having cheap hardware
devices embedding complex artificial intelligence techniques for forecasting in unknown environments,
with affordable computing and economic costs. As far as we know, it is usual to employ
a WSN as the monitoring system that feeds an ANN implemented in a personal computer, since an ANN
requires some complex calculations and large data storage. However, the question
posed in this paper is whether or not it is feasible to implement, inside a node of a WSN, an ANN
that performs predictions with an acceptable resolution in its estimations. Consequently, in this paper,
we present a preliminary model able to generate low error predictions over short periods of learning
time. Regarding the innovation of the paper, a new on-line learning algorithm has been developed,
based on a BP framework, which is able to preprocess real-time continuous input data, arriving in a
non-deterministic way from a wireless environment, and which is also feasible to implement in devices
with very low hardware resources, i.e., with important hardware constraints.

The paper is organized as follows. In Section 2, we describe the framework in which the present
work has been developed, i.e., a wireless sensor network, describing the hardware of the different
nodes and the network topology in order to briefly present the experimental setup. In Section 3, we
explain the approach that has been followed to forecast time series using an on-line learning paradigm
based on the Back-Propagation (BP) algorithm for Artificial Neural Networks. Section 4 describes in detail
the algorithm developed to be implemented in a low resource microcontroller such as the 8051 MCU. Finally,
Sections 5 and 6 present the experimental results, and the discussion and conclusions, which explain the
present research and draw some future ideas to continue the present project.

2. Wireless Sensor Network Architecture 

Basically, a WSN consists of a large number of low-cost, low-power and multifunctional sensor nodes
that are deployed in an environment for monitoring tasks, but also for control, since current
networks are bidirectional and sensor activity can be controlled. Such sensor nodes, small
in size, are equipped with sensors, embedded microprocessors and radio transceivers, and therefore
they also have capabilities for data processing and communication over short distances, via a wireless
medium, in order to collaborate to accomplish a common task [1].

A WSN is built of nodes which can vary in number from a few to several hundred or even thousands, 
in which each node is connected to one or several sensors. Each sensor network node is typically divided 
into several parts: a radio transceiver with an internal antenna or connection to an external one, a 
microcontroller, an electronic circuit for interfacing with the sensors and an energy source, usually a 
battery or an embedded device for energy harvesting [11]. Furthermore, the cost of sensor nodes is also 
variable, ranging from a few to hundreds of Euros, depending on the complexity of the individual sensor 
nodes. Additionally, size and cost constraints on sensor nodes also result in the corresponding constraints 
on resources such as energy, memory, computational speed and communications bandwidth. Finally,
the topology of a WSN can vary from a simple star network to an advanced multi-hop
wireless mesh network. In addition, the propagation technique between the hops of the network can be
routing or flooding [12,13].

Besides, sensor networks have the following unique characteristics and constraints, as stated in [1]:

• Dense Node Deployment. The number of sensor nodes can be of several orders of magnitude. 

• Battery-Powered Sensor Nodes. In some situations it is difficult or even impossible to change or
recharge their batteries.

• Severe Energy, Computation, and Storage Constraints. Sensor nodes are resource limited. This 
work is focused on this constraint. 

• Self-Configurable. Sensor nodes configure themselves into a communication network. 

• Application Specific. A network is usually designed and deployed for a specific application. 

• Unreliable Sensor Nodes. They are prone to physical damages or failures. 

• Frequent Topology Change. Network topology changes due to node failure, damage, addition, 
energy depletion, or channel fading. 

• No Global Identification. It is usually not possible to build a global addressing scheme for a sensor 
network because it would introduce a high overhead for the identification maintenance. 

• Many-to-One Traffic Pattern. In most sensor network applications, the data sensed by sensor nodes 
flow from multiple source sensor nodes to a particular sink. 

• Data Redundancy. The data sensed typically have a certain level of correlation or redundancy. 

Furthermore, the characteristics of sensor networks and requirements of different applications have 
a decisive impact on the network design objectives in terms of network capabilities and network 
performance. Thus, typically influential design objectives for sensor networks include the following 
several aspects: small node size, low node cost, low power consumption, self-configurability, scalability, 
adaptability, reliability, fault tolerance, security, channel utilization and QoS support [1]. 

Moreover, a typical wireless sensor network is composed of a set of nodes that transmit
the acquired information to a sink node. The sink is usually devoted to collecting and centralizing all the
information that comes from the network in a Personal Computer (PC), in order to store large quantities of
data in a persistent device. Thus, the information collected can be treated for on-line or later analysis.
For the purposes of our study, however, we do not want to dump the information acquired by the network to a PC.
Instead, such information is used, as will be described later, to train a neural network implemented
inside the sink node, with the aim of developing an autonomous forecasting system.

Figure 1 shows the wireless sensor network scheme designed in the present study. The network is 
composed of five nodes, but more nodes can be added in the way the figure displays. There is a sink 
node connected to a PC for configuration and validation purposes and four sensor nodes that capture the 
temperature inside a room. Sensor nodes can work as repeaters allowing low power transmit modes in 
order to extend battery life. 
2.1. Nodes Description

As mentioned previously, our wireless sensor network consists of two kinds of nodes: four sensor
nodes and one sink node. Both are based on the same technology although, of course, they are in charge of
different tasks. All nodes are based on the CC1110F32 microcontroller (Texas Instruments, Dallas, TX,
USA) [14]. The CC1110F32 is a true low-power sub-1 GHz system-on-chip (SoC) designed for low
power wireless applications. It combines the excellent performance of the state-of-the-art RF transceiver
CC1101 with an industry-standard enhanced 8051 MCU, up to 32 KB of in-system programmable flash
memory and up to 4 KB of Random Access Memory (RAM), and many other powerful features. The
radio frequency range can be chosen from: 300-348 MHz, 391-464 MHz and 782-928 MHz. Its small
6 x 6 mm package makes it well suited for applications with size limitations. The CC1110F32 is highly
suited for systems where very low power consumption is required. This is ensured by several advanced
low-power operating modes. Additionally, its low power demands (16 mA for transmission at 10 mW
and 18 mA for reception) make it suitable for battery-powered systems.

Sink node (Figure 2): Receives all the wireless information transmitted by the sensor nodes. It can be
connected to a PC through a USB connection for configuration purposes, and it operates at a frequency of
868 MHz. The USB is also its power connection, its dimensions are 40 x 40 x 90 mm and it is able to
work within a temperature range of -40 to 85°C. The 2-FSK, GFSK, MSK, ASK, and OOK modulation
formats are supported and it includes a 128-bit AES security coprocessor. Its transmission power is
10 mW and it has a high sensitivity of -110 dBm at 1.2 kBaud. It has an external exchangeable antenna and
a programmable data rate up to 500 kBaud. The processor also includes one 16-bit timer, three
8-bit timers and on-chip hardware debugging.

Figure 2. Sink node.


Sensor node (Figure 3): There are four sensor nodes, distributed in a room for the present study,
that constantly send the temperature they pick up from the environment. All four have been calibrated
with a digital thermometer at the same point and then distributed at similar distances from
the sink node. The sensor node is based on the same SoC as the sink node and additionally it has
a temperature sensor, three color LEDs, two buttons (one reset and another for general purpose) and
an input/output expansion connector to install other sensors/actuators. It can be powered by
batteries, USB or electric power. In addition, its antenna, which is integrated in the circuit, can reach
up to 290 m without repeaters. It is possible to connect more sensor nodes, as many as are needed,
depending on the target application. They autonomously connect themselves to the network and start to
send temperatures immediately.


Figure 3. Sensor node (the CC1110F32 microcontroller is marked in the picture).


The hardware of the sensor and sink nodes has been developed by the company Wireless Sensor
Networks Valencia SL for applications of monitoring and control of industrial systems, domotics,
security and alarm systems, sensorization, telemetry, etc. In this study, such nodes have been
programmed to deploy the wireless sensor network and to implement an ANN for on-line learning
purposes.

There are other research works developed with more advanced architectures, such as ARM
microcontrollers. Nevertheless, the Texas Instruments CC1110F32 MCU is widely accepted
in many areas and, as far as we know, is a market leader for industrial monitoring and control
applications. Moreover, it is also a true low power SoC, excellent for WSNs. If the application requires
massive deployment of such devices, they are cost competitive and really small. These devices offer
the simplest architecture that makes the implementation of our algorithm feasible. The SoC has
been marked in the pictures to show its size.

3. Time Series and Forecasting 

In smart homes it is common to measure and monitor environmental variables related to comfort
and energy consumption. This kind of data is recorded over a period of time (every second, every
minute...). Studying the behavior of these variables and forecasting their future values allows
energy resources to be managed more efficiently.

A collection of data recorded repeatedly over time represents a time series from the point of view
of a statistical treatment of data. Thus, a time series is a collection of data recorded over a period of
time from any process of interest. It can be formalized as a sequence of scalars from a variable x
obtained as output of the observed process:


$$x_{t_0}, x_{t_1}, \ldots, x_{t_{i-1}}, x_{t_i}, x_{t_{i+1}}, \ldots \qquad (1)$$

There are several methods that are appropriate to model a time series. Time series data have an internal
structure, such as auto-correlation, trend or seasonal variation, and these features should be considered
by the modeling and forecasting methods that are used.

Several approaches have been widely employed, such as smoothing and curve fitting techniques and
autoregressive and moving average models [15-17]. However, the storage and
computational restrictions of our present study do not allow us to use them. Instead, other methods framed in
regression models have been implemented to carry out the main purpose of this paper.

In order to simplify the forecasting process as much as possible, in this work two different models
implementable with low resources have been proposed: a linear model (that is, a Perceptron) and a
Multilayer Perceptron (MLP) model with one hidden layer. Both methods have been implemented and
introduced into the sink node.

Furthermore, for the purpose of comparison and in order to know the behavior of these two
algorithms, a standard baseline method has been developed on a PC. A linear model (a perceptron)
has been selected and estimated by a standard method: Bayesian estimation. The baseline model
represents the results of a standard model, estimated with higher storage requirements and with more
complex computational needs. It is expected that the results of this baseline model will be better
than those provided by the constrained models. However, this comparison will allow us to assess, based
on the magnitude of the errors obtained, the feasibility of using the algorithms implemented in the device
which has been proposed in this paper.

3.1. Measurements of Average Model-Performance 

Measurements of average error or model performance are based on statistical summaries of the
differences between each target value vector y[t] and its predicted vector ŷ[t]. In general, the most used
measure of average model performance is the MAE (Mean Absolute Error):

$$\mathrm{MAE}(t) = \frac{1}{q} \sum_{i=1}^{q} \left| y_i[t] - \hat{y}_i[t] \right| \qquad (2)$$

where q is the number of elements of both vectors. The MAE averaged over several time instants in a
data set D will be denoted as MAE*:

$$\mathrm{MAE}^{*} = \sum_{t \in D} \frac{\mathrm{MAE}(t)}{|D|} \qquad (3)$$

|D| being the number of samples in D.
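As an illustration, Equations (2) and (3) can be computed as in the following minimal Python sketch. The function names mae and mae_star are hypothetical and are not part of the implementation described in this paper; targets and predictions are assumed to be plain lists of equal length.

```python
def mae(y_true, y_pred):
    """Mean absolute error between one target vector and its prediction (Equation (2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mae_star(targets, predictions):
    """MAE averaged over all time instants of a data set (Equation (3))."""
    errors = [mae(y, y_hat) for y, y_hat in zip(targets, predictions)]
    if not errors:
        raise ValueError("the data set is empty")
    return sum(errors) / len(errors)
```

On a device that cannot store the data set, the same average could be kept as a running sum and a counter, since each MAE(t) is only needed once.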

3.2. On-Line Learning in Time Series Forecasting 

An on-line learning approach, such as the one proposed in this work, is a class of sequential learning
paradigm where data frames arrive in real time. The present work deals with data frames that are
not equidistant in time. This enables the WSN to be modified at any time, and it means that the
failure of any node at any moment is not a critical issue. In Section 4 we explain how the algorithm
preprocesses the raw incoming data frames to compute equidistant and first-order differenced time
series values. Furthermore, the low hardware resources constraint imposed on the algorithm
implementation only lets us store a short buffer of incoming data, forcing the time series forecasting
model to update its parameters with every new value that arrives.
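The following Python sketch illustrates one possible way of turning non-equidistant frames into an equidistant, first-order differenced series. It is an illustration under stated assumptions, not the algorithm of Section 4: the function names, the (timestamp, temperature) frame format and the decision to skip windows without frames are hypothetical; only the 15 min window length is taken from Section 4.

```python
def window_means(frames, window_seconds=900):
    """Average the temperatures that fall in each fixed-length time window.

    frames: iterable of (timestamp_in_seconds, temperature) pairs.
    Windows that received no frame are skipped (a simplification).
    """
    sums = {}
    for timestamp, temperature in frames:
        index = int(timestamp // window_seconds)
        total, count = sums.get(index, (0.0, 0))
        sums[index] = (total + temperature, count + 1)
    return [sums[i][0] / sums[i][1] for i in sorted(sums)]


def first_differences(values):
    """Return the first-order differences v[k+1] - v[k]."""
    return [b - a for a, b in zip(values, values[1:])]
```

A memory-constrained node would not keep the dictionary; it would hold only the running sum and count of the current window and the previous window mean.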

As far as we know, the convergence and behavior of on-line algorithms have been studied by
machine learning scientists, including the well known Stochastic Gradient Descent and its variants, and
particularly the error Back-Propagation (BP) algorithm for Artificial Neural Networks [18]. The on-line
BP is stochastic when the training samples are learned in a random order, possibly with replacement.
This randomized update process yields a noisy gradient computation, which is usually overcome
by a higher number of model updates compared with the batch version of the same BP algorithm.
Therefore, in the standard framework for on-line BP, random traversal of the dataset yields better results.

However, this work is focused on a straightforward algorithm for real-time time series learning that
follows an on-line BP strategy but in a sequential way. Such a problem statement has the advantage of
being implementable in low resource devices, but it lacks the random data traversal of usual
on-line BP approaches, which could be a source of problems in the convergence of the algorithm. As
expected, this problem could become more severe the more complex the model is.

A possible refinement, outside the scope of this work, could be the implementation of a random
skip procedure that ignores incoming values depending on a probability parameter. Such
a skip procedure introduces a trade-off between the stochastic behavior in gradient computation and
the number of samples used to train the model. Additionally, other interesting extensions are possible
for more powerful devices, such as a short buffer of input/output samples, selected randomly from the data
stream (known as reservoir sampling), that performs a model update every k random steps. Even more
complicated algorithms are also possible, where the skip decision rule depends on the behavior of the
model with current data samples.
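Both extensions can be sketched in a few lines of Python. This is only an illustration of the two ideas (a random skip rule and the classical reservoir sampling scheme); neither has been implemented in this work, and the names keep_sample and ReservoirBuffer are hypothetical.

```python
import random


def keep_sample(skip_probability, rng):
    """Random skip rule: return False with probability skip_probability."""
    return rng.random() >= skip_probability


class ReservoirBuffer:
    """Fixed-size uniform random sample of a stream (reservoir sampling)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.rng = rng or random.Random()
        self.items = []
        self.seen = 0  # number of samples offered so far

    def offer(self, sample):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            # Replace a stored sample with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = sample
```

The buffer size is the only memory cost, so the capacity would have to be chosen against the RAM budget of the device.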

Different variations of the on-line BP algorithm can be taken into consideration for real-time fitting. It is
true that Least Mean Squares training has problems when input variables are on different scales. Thus,
Normalized Least Mean Squares [19] training has been proposed to tackle such scale problems. However,
the developed system adopts first-order differences of input data and all input variables belong to the
same signal, that is, all input variables have the same scale. Another issue of the on-line training algorithm,
when capture error exists, is that it does not converge to a unique value, but to a minimum capture error. This
issue could be solved by incorporating dead-zones in the algorithm, as stated in [19].
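For reference, a textbook Normalized Least Mean Squares update with a dead-zone can be sketched as follows for a linear model with a single output. It is not part of the developed system, and the function name nlms_step, the step size and the dead-zone width are illustrative assumptions.

```python
def nlms_step(weights, x, target, step_size=0.5, dead_zone=0.0, eps=1e-8):
    """One NLMS update; returns (new_weights, prediction_error).

    The update is skipped when the absolute error lies inside the dead-zone,
    so that errors of the order of the capture error do not move the weights.
    """
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = target - prediction
    if abs(error) <= dead_zone:
        return list(weights), error
    # Normalizing by the input energy makes the step independent of input scale.
    scale = step_size * error / (eps + sum(xi * xi for xi in x))
    return [w + scale * xi for w, xi in zip(weights, x)], error
```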

3.3. Baseline Method: Standard Bayesian Linear Model Estimation 

In linear regression models, observations consist of a response variable in a vector y and one or more
predictor variables in a matrix X. The response vector has n elements, corresponding to n observations, so
the matrix X will have n rows and p columns corresponding to the number of predictors or covariates
(n > p + 1 for non-degenerate variance parameter estimation). Additionally, it is usual to introduce an
intercept parameter in the model, thus one column of X is a column of ones. In the
same way, other exogenous variables, which could condition the forecast of the response variable,
could be introduced as additional columns in matrix X. Consequently, the parameters are the regression
coefficients, w, and the error variance of the fitted model, σ². Thus, the linear model can be written in
matrix form:

$$y \mid w, \sigma^2, X \sim N(Xw, \sigma^2 I_n) \qquad (4)$$

where I_n represents the n x n identity matrix.

Occasionally, more than one future value must be predicted, hence the model receives at each moment
as input the last p values in the time series and must predict the next q values, thus y and w become n x q
and p x q matrices Y and W. In this framework, at time t one new element of p past values
in the time series (x_{t-p+1}, ..., x_{t-1}, x_t) is available, which is considered as covariates in the simple
linear model to forecast q further values (x_{t+1}, ..., x_{t+q}) = (y_1, ..., y_q). In consequence, a simple linear
model has been built for each prediction:

$$Y_{\cdot j} \mid W_{\cdot j}, \sigma^2, X \sim N(X W_{\cdot j}, \sigma^2 I_n), \quad j = 1, \ldots, q \qquad (5)$$

where M_{·j} denotes the j-th column of any matrix M.
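The construction of the predictor and response matrices from a scalar series can be sketched as follows in Python. The function name make_lagged is hypothetical, and the optional intercept column and exogenous variables mentioned above are omitted for brevity.

```python
def make_lagged(series, p, q):
    """Build X (rows of p past values) and Y (rows of q future values).

    Row k of X is (x_{t-p+1}, ..., x_t) and row k of Y is (x_{t+1}, ..., x_{t+q})
    for consecutive values of t.
    """
    X, Y = [], []
    for t in range(p - 1, len(series) - q):
        X.append(list(series[t - p + 1:t + 1]))
        Y.append(list(series[t + 1:t + 1 + q]))
    return X, Y
```

For example, with the series 0, 1, 2, 3, 4, 5 and p = 3, q = 2, the first row of X is (0, 1, 2) and the first row of Y is (3, 4).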

This forecasting process needs n + 1 observations to start to generate predictions (X represents a
(n + 1) x p matrix and Y a (n + 1) x q matrix) and q linear models must be estimated (one for each
prediction) in each time step.

Some assumptions must be considered about the classical linear model, such as X being full rank
(no collinearity among predictors), exogenous predictors and non-auto-correlated errors. Using this sort of
model to represent time series, with lagged predictors to incorporate feedback over time, means
that some of these assumptions are violated. Autoregressive processes comparable to this one introduce
violations of classical linear model assumptions that lead to biased parameter estimates. Nevertheless,
it could be improved by considering more complex covariance structures. As mentioned before,
a random skip procedure could solve some of these problems. However, in this case, the primary goal
of our work has been focused on building an on-line model estimation process. Moreover, it has to be
able to perform predictions with an acceptable resolution in its estimation, and with low computing and
storage resources.

A Bayesian framework [20] provides a natural way to perform an on-line estimation. Such methods
make it possible to incorporate scientific hypotheses or prior information based on previous data, by
means of the prior distribution. In the absence of prior information, a Bayesian estimation
of the parameters is made in an objective context, with objective prior distributions for them, that are
estimated with information provided by the data. In contrast, when prior information is available or
some scientific hypotheses have been assumed, a Bayesian estimation can be made in a subjective
framework, incorporating prior information into the parameter prior distributions.

In the context of this work, the first step is a Bayesian parameter estimation, made by means of a
non-informative prior distribution for the parameters. Once the first parameter estimation is available,
the estimated model is used to generate the predictions. The predictive distribution, Ŷ, given a new
set of predictors X_p has mean:

$$\hat{Y}_{\cdot j} = X_p W[0]_{\cdot j}, \quad j = 1, \ldots, q \qquad (6)$$

where W[0] is calculated with the first n + 1 data elements available, X and Y, as:

$$W[0]_{\cdot j} = (X^T X)^{-1} X^T Y_{\cdot j} \qquad (7)$$


Therefore, it is necessary to compute matrix products and matrix inverses of
dimension (p x p). Matrix inversion could be avoided by employing, for example, a QR decomposition of the
X matrix, but in terms of computational and storage cost it remains an expensive process.
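To make the cost of this first step concrete, the following Python sketch computes W[0] from Equation (7) by solving the normal equations with Gaussian elimination. This is an illustrative choice (neither an explicit inverse nor the QR decomposition mentioned above), the function names are hypothetical, and the whole X and Y matrices must be held in memory, which is precisely what the constrained device cannot afford.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: X is not full rank")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_initial_weights(X, Y):
    """Return W[0] as a p x q list of lists; column j solves (X^T X) w = X^T Y_j."""
    p, q = len(X[0]), len(Y[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    columns = []
    for j in range(q):
        XtY = [sum(xr[a] * yr[j] for xr, yr in zip(X, Y)) for a in range(p)]
        columns.append(solve_linear(XtX, XtY))
    return [[columns[j][a] for j in range(q)] for a in range(p)]
```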

After this first step, the system has prior information available, which must be incorporated in future
parameter estimations. The way to combine previous information with new data to improve such
parameter estimations is by using an informative prior distribution. In the context of this work, linear
regression with an informative prior distribution was used. At time t, the previous parameter
estimation was employed in the prior distribution of the parameters and new data were used to re-estimate
the model parameters. Furthermore, if at time t the last parameter estimation is W[t - 1] and we have
new data X[t] and Y[t], the new estimate at this point of time can be computed by treating the prior
as additional data points, and then weighting their contribution to the new estimation [20]. To perform
the computations, for each prediction value Y[t]_{·j} (j = 1, ..., q) it is necessary to construct a new
vector of observations Y* with the new data and the last parameter estimates, a predictor matrix X*, and a
weight matrix S based on the previous variance parameter estimation as follows (more details can be found
in the Appendix):


[Equations (8) and (9), defining Y*, X* and S, are missing from the extracted text.]








where I_p represents a p x p identity matrix. The new parameter estimate at time t can be written as:

$$W[t]_{\cdot j} = (X^{*T} S^{-1} X^{*})^{-1} X^{*T} S^{-1} Y^{*}, \quad j = 1, \ldots, q \qquad (10)$$

Thus, the predictive distribution, Ŷ, given a new set of predictors X_p has as mean:

$$\hat{Y}_{\cdot j} = X_p W[t]_{\cdot j}, \quad j = 1, \ldots, q \qquad (11)$$

The complexity of the second and subsequent steps in the estimation process is higher than that of the first step. The
number of matrix inversions and the computational complexity have now increased.

Consequently, even assuming a simple linear model, with the limitations mentioned before that this type of
model has for modeling dynamic processes, the computational and storage costs are too high for it to be
implemented in a device with low hardware resources, such as the one considered in this paper. The
standard Bayesian linear model parameter estimation is a simple process but with high computational resource
requirements, since it is necessary to compute some matrix inverses, whose cost is too high in the
context of this paper.

In conclusion, this model has been considered as the baseline model to compare the results of both
algorithms related to the two ANN models that have been implemented in the sink node.

4. Sequential On-Line Back-Propagation Algorithm for Devices with Low Resources 

This section describes the considered implementation of an on-line version of the BP algorithm [18]
for devices with very little memory and few computing resources, such as the 8051 microcontroller included in
the sink node. The BP algorithm is able to train any kind of ANN. As stated in Section 3.2, the
present research proposes the use of a sequential version of on-line BP, and compares two different
models with the previously stated baseline: a linear model (that is, a perceptron) as shown in Equation (12),
equivalent to the model of Equation (5), and an MLP with one hidden layer as shown in Equation (13):

$$y = W_1 \cdot x + b_1 \qquad (12)$$

$$y = W_2 \cdot s(W_1 \cdot x + b_1) + b_2 \qquad (13)$$

W_j being weight matrices, b_j bias vectors, s the activation function of the hidden layer, x the input and
y the output of the ANN. Appendix B
describes the derivation of the BP algorithm to train any kind of ANN, from perceptrons to MLPs with
any number of hidden layers. Moreover, the input vector x can also be extended with other exogenous
variables that could influence the output response variable. BP is a first order gradient descent
algorithm, therefore it needs the computation of partial derivatives with respect to W_j and b_j. The BP algorithm has
been chosen because of its simplicity: it depends on algebraic operations such as dot products, matrix-vector
products and component-wise products. The proposed MLP has the logistic activation function in the
hidden layer, which can be implemented by means of an exponential function. Such
operations are available through hardware and/or software libraries for the 8051 microcontroller, and the memory
requirements to implement these operations depend on the complexity of the ANN developed:

• In the case of the perceptron, it needs a weight matrix $W_1$ of size p × q, a bias vector with
q elements, input and output vectors of size p and q, and an output gradient vector with q
elements. The BP algorithm therefore needs p × q + p + 3q real numbers in memory to work with the
perceptron. With p = q = 8, this amounts to 96 real numbers.
• In the case of the MLP with one hidden layer of size h, it needs two weight matrices, $W_1$ of size
p × h and $W_2$ of size h × q; two bias vectors, with h and q elements; the input, hidden and output
vectors, of size p, h and q respectively; and the output and hidden gradient vectors, with q and h
elements. The BP algorithm needs p × h + h × q + 2p + 3h + 2q real numbers. With p = h = q = 8,
it needs 184 real numbers in memory.

Using 32-bit precision for real numbers, the memory required by the BP algorithm for the one-hidden-layer
MLP with p = h = q = 8 is 184 · 4 = 736 bytes.
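The counts above can be checked with a few lines of Python. This is only an illustrative sketch that adds up the buffers enumerated in the two list items; the function names are made up for the example and are not part of the sink-node source code.

```python
BYTES_PER_REAL = 4  # 32-bit real numbers, as stated in the text


def perceptron_reals(p, q):
    """Real numbers BP keeps in memory for a perceptron with p inputs and q outputs."""
    weights = p * q      # weight matrix W1
    bias = q             # bias vector b1
    vectors = p + q      # input and output vectors
    gradients = q        # output gradient vector
    return weights + bias + vectors + gradients


def mlp_reals(p, h, q):
    """Same count for an MLP with one hidden layer of size h."""
    weights = p * h + h * q   # W1 and W2
    biases = h + q            # b1 and b2
    vectors = p + h + q       # input, hidden and output vectors
    gradients = h + q         # hidden and output gradient vectors
    return weights + biases + vectors + gradients


perceptron_count = perceptron_reals(8, 8)
mlp_count = mlp_reals(8, 8, 8)
print(perceptron_count)              # 96
print(mlp_count)                     # 184
print(mlp_count * BYTES_PER_REAL)    # 736
```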

A BP algorithm has been included in the source code implemented for the sink node.
This algorithm is split into three parts; it computes the differences in mean temperature
every quarter of an hour (15 min) and uses these data to train the ANN model. The memory requirements of
this algorithm and its characteristics are described in detail in the next sections. The source code of
the sensor nodes is not described, as they only send the temperature to the sink node.

4.1. Main Loop Algorithm 

The experimental setup consisted of four sensor nodes placed equidistantly in the living room
of a solar house. They continuously send the temperature to a sink node that is in charge of predicting the
mean temperature for the next hours. The sink node is connected to a PC, mainly for configuration
and power reasons, and was placed at a small workplace in the living room. The sink node sends the
temperature predictions to the home's central control, an application developed by the
present group to monitor and control all the energy systems for the Solar Decathlon
competition [21]. It is called CAES (Computer-Aided Energy Saving System) [8].

The main loop procedure implemented in the sink device is shown in Algorithm 1. Before the main
loop, the sink node performs some initializations, such as the board support package, the minimal radio
frequency (RF) interface and the remaining aspects of the wireless network. To that end, the SimpliciTI
protocol from Texas Instruments has been compiled; it is a simple low-power RF
network protocol aimed at small RF networks. All of those aspects correspond to the INITIALIZEWSN()
function. After that, the weights required by the ANN are initialized through the function
INITIALIZEWEIGHTS(). This can be done randomly, so that the ANN starts from scratch, or
by reading the weights from the computer, so that the system starts with a pre-trained
ANN. It was decided to initialize the parameters randomly.

The core of the main loop first receives a data frame (v) from each sensor node, which corresponds
to the WAITFORDATAFRAME() procedure. The frame includes the temperature and some information to identify
the sending node. Sensor nodes send the temperature constantly and, as they are placed equidistantly in a
medium-sized room, the difference in temperature among them is minimal, so their readings can be considered
redundant for the present application. The most important point for our study, however, is to be able to
process continuous messages coming from different wireless nodes in order to perform on-line learning,
as stated before. When a data frame is received, its time stamp (t) is collected through the function
ASKFORCURRENTTIMESTAMP() and then the subroutine PROCESSSAMPLEONLINE(v, t) is called.
This function receives a temperature value and a time stamp, and computes quarter-hour
(15 min) averages, which are fed to the ANN. Such averages are computed by integrating between the
previous (time, temperature) pair and the current one. When a quarter is complete, its mean is given to
the ANN as described later.

This main loop has negligible memory requirements; it only uses static variables to pass data
between the different subroutines. The memory consumed by the wireless communication protocol
is ignored in this paper, as it is not the focus of the presented work. Depending on the nature of the data
frames and on the implementation of the PROCESSFORECASTOUTPUT(o) function, the whole algorithm
can be used for different applications. In the present work, the goal is to predict the indoor temperature,
and thus every data frame is a temperature value given by a wireless node.

Algorithm 1 Main procedure for sink node.

1: INITIALIZEWSN()
2: INITIALIZEWEIGHTS()
3: while true do
4:     v = WAITFORDATAFRAME()
5:     t = ASKFORCURRENTTIMESTAMP()
6:     o = PROCESSSAMPLEONLINE(v, t)
7:     PROCESSFORECASTOUTPUT(o)
8: end while


4.2. On-Line Sample Processing 

The subroutine PROCESSSAMPLEONLINE(v, t) is shown in Algorithm 2. This subroutine receives
a temperature frame v and its timestamp t in seconds, and executes one iteration of the on-line
algorithm. The algorithm is responsible for computing the mean temperature every Q
seconds. Once a mean value is available (which happens every Q seconds), the algorithm calls the
procedure TRAINANDFORECAST($v_q$), which computes the first-order differences of these means and
adjusts the model weights. In the presented work, Q = 900 s (15 min, a quarter of an hour), because it
is a reasonable value for temperatures.

During the computation of the temperature mean, due to the non-deterministic nature of the WSN, the data
frames may not be equidistant in time. The algorithm deals with this by implementing an aggregation
procedure for the mean temperature computation (for this aggregation, it is assumed that every
node senses similar temperatures, so their readings can be mixed without problems). Every pair
of consecutive input data belonging to the same time quarter is integrated and added to an
accumulator variable (see line 18 of Algorithm 2) following this equation:

$$A(t_i, v_i, t_{i+1}, v_{i+1}) = \frac{(t_{i+1} - t_i) \cdot (v_i + v_{i+1})}{Q \cdot 2} \qquad (14)$$


where $(t_i, v_i)$ and $(t_{i+1}, v_{i+1})$ are the pair of input data. Figure 4 shows a graphical illustration of
this process for the case in which both data pairs fall within the same quarter.

Algorithm 2 PROCESSSAMPLEONLINE(v, t)

Require: v, t are real numbers with the data value and the timestamp in seconds. v', t', q'_t and v_q are
static variables initialized with invalid values; the first three store the data value, the time and the
quarter number of the previous function call, and v_q aggregates the data value for the current time
quarter (the algorithm computes the mean every quarter). The constant Q = 900 is the number of seconds in
a quarter. The algorithm interpolates quarter mean values when they are missing, up to a maximum
of M = 4 quarters.

Ensure: A prediction vector if it can be computed; otherwise a NULL value.

 1: q_t ← ⌊t / Q⌋
 2: if t' is not a valid value then
 3:     v_q ← v · (t mod Q) / Q   // Initializes accumulated data for aggregation
 4: else
 5:     if q_t − q'_t > M then   // Quarter change limit exceeded
 6:         RESET()
 7:     else
 8:         (m, b) = L(t', v', t, v)   // Interpolates line v_i = m · t_i + b using Equation (15)
 9:         t_i = Q · (q'_t + 1)
10:         while t_i ≤ t do
11:             v_{t_i} = t_i · m + b
12:             v_q ← v_q + A(t', v', t_i, v_{t_i})   // Aggregates data change following Equation (14)
13:             o_{t_i} = TRAINANDFORECAST(v_q)   // Last o_{t_i} is stored to be returned at function end
14:             v_q ← 0.0; t' ← t_i
15:             t_i ← t_i + Q
16:         end while
17:         if t' < t then
18:             v_q ← v_q + A(t', v', t, v)   // Aggregates last data change following Equation (14)
19:         end if
20:     end if
21: end if
22: t' ← t; v' ← v; q'_t ← q_t
23: return o_{t_i} if available, otherwise null
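The aggregation logic of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the code running on the 8051: the class and function names are hypothetical, and three choices are assumptions made for the sketch rather than taken from the paper: samples with a non-increasing timestamp are dropped, a sample that triggers the reset is treated as a first sample, and both t' and v' are moved to the interpolated boundary point after each completed quarter so that consecutive trapezoids share an endpoint. A full implementation would also reset the static variables of Algorithm 3.

```python
Q = 900  # seconds in a quarter
M = 4    # maximum number of quarters bridged by interpolation


def aggregate(t0, v0, t1, v1):
    """Equation (14): trapezoid between two samples, normalised by Q."""
    return (t1 - t0) * (v0 + v1) / (Q * 2.0)


def line(t0, v0, t1, v1):
    """Equation (15): slope m and intercept b of the line through two samples."""
    m = (v1 - v0) / (t1 - t0)
    return m, v0 - m * t0


class QuarterAggregator:
    def __init__(self, train_and_forecast):
        self.train_and_forecast = train_and_forecast  # called once per finished quarter
        self.reset()

    def reset(self):
        self.t_prev = self.v_prev = self.q_prev = None
        self.v_q = 0.0

    def process_sample(self, v, t):
        q_t = int(t // Q)
        out = None
        if self.t_prev is not None and t <= self.t_prev:
            return None  # assumption: ignore duplicate or out-of-order timestamps
        if self.t_prev is not None and q_t - self.q_prev > M:
            self.reset()  # too many quarters lost: start over, keeping the model weights
        if self.t_prev is None:
            # First sample: assume the value v held since the quarter started.
            self.v_q = v * (t % Q) / Q
        else:
            m, b = line(self.t_prev, self.v_prev, t, v)
            t_i = Q * (self.q_prev + 1)  # end of the quarter currently being filled
            while t_i <= t:
                v_i = m * t_i + b
                self.v_q += aggregate(self.t_prev, self.v_prev, t_i, v_i)
                out = self.train_and_forecast(self.v_q)
                self.v_q = 0.0
                self.t_prev, self.v_prev = t_i, v_i  # assumption: both move to the boundary
                t_i += Q
            if self.t_prev < t:
                self.v_q += aggregate(self.t_prev, self.v_prev, t, v)
        self.t_prev, self.v_prev, self.q_prev = t, v, q_t
        return out


# Usage: a constant 20 °C signal sampled every 30 s yields quarter means of 20 °C.
means = []
agg = QuarterAggregator(means.append)
for t in range(10, 2711, 30):
    agg.process_sample(20.0, t)
print(len(means))  # 3
```

Passing a callback instead of calling the training routine directly keeps the sketch self-contained; on the device the call would go to TRAINANDFORECAST.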


This aggregation has two boundary cases: when two consecutive pairs fall in different but
consecutive time quarters (see Figure 5), and when two consecutive pairs fall in different and
non-consecutive quarters due to the loss of a large number of frames (see Figure 6). Both situations,
as well as the case in which a quarter is fully processed, are handled by the loop at line 10. Before the
loop, a line following the temperature slope between the two available input
data pairs is interpolated, using the following slope-intercept equation:

$$L(t_i, v_i, t_{i+1}, v_{i+1}) = \left( m = \frac{v_{i+1} - v_i}{t_{i+1} - t_i},\; b = v_i - m \cdot t_i \right) \qquad (15)$$

The loop begins at the start point and traverses the line segment in increments of Q seconds, computing
the mean temperature of every quarter between both input pairs. When a quarter is completed,
its mean value is given to the subroutine TRAINANDFORECAST($v_q$). In the extreme case of losing a
very large number of frames, the whole system is reset at line 6 of Algorithm 2, starting the process again
but without re-initializing the model weights (this reset procedure also initializes the static variables of
Algorithm 3).
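As a worked illustration of Equations (14) and (15), consider two samples that straddle a quarter boundary. The numbers are invented for the example: the previous pair is (t', v') = (880 s, 20.0 °C), the current pair is (t, v) = (920 s, 21.0 °C), and Q = 900 s.

```latex
\begin{align*}
m &= \frac{21.0 - 20.0}{920 - 880} = 0.025, &
b &= 20.0 - 0.025 \cdot 880 = -2.0, \\
v_{900} &= 0.025 \cdot 900 - 2.0 = 20.5, \\
A(880, 20.0, 900, 20.5) &= \frac{(900 - 880)(20.0 + 20.5)}{900 \cdot 2} = 0.45, \\
A(900, 20.5, 920, 21.0) &= \frac{(920 - 900)(20.5 + 21.0)}{900 \cdot 2} \approx 0.461.
\end{align*}
```

The first contribution closes the quarter ending at t = 900 s, and the second one opens the accumulator of the following quarter.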

The memory requirement of the subroutine PROCESSSAMPLEONLINE(v, t) is again negligible. It uses
only a few static variables to aggregate the data values and/or interpolate the lost quarter values. For
temperature, data values are in °C.



Figure 4. Illustration of the integration process of previous and current input data pairs.

Figure 5. Illustration of the integration process of previous and current input data pairs
when they are in different but consecutive quarters. The interpolation computed until 13:00, which
completes a quarter mean value, is shown in green. The interpolation that would be aggregated to the
quarter ending at 13:15 is shown in gray.

Figure 6. Illustration of the interpolation and integration of previous and current input
data pairs when they are in non-consecutive quarters. The interpolation of the computed quarters that
end at 13:15, 13:30 and 13:45 is shown in green. The interpolation aggregated to the following quarter,
which would end at 14:00, is shown in gray.

4.3. On-Line Training and Forecast using Back-Propagation 

The last subroutine is shown in Algorithm 3. It computes the difference between two consecutive
quarter means and stores it in an auxiliary circular buffer B of length p + q. A counter k
tracks the number of items in the buffer and determines whether it is possible to produce a forecast
and/or to perform one training step of the model. In particular, the forecast condition is true
when the buffer counter k is greater than or equal to the model's input size p. The training
condition is true when k is greater than or equal to the sum of the model's input and output
sizes, p + q.

The algorithm uses a static variable $v'_q$ in which the value given in the previous function call is stored.
Thus, in the first call, the body of the if statement at line 1 is not executed. In the following calls, the
first-order difference is computed at line 2 and the element counter is increased by one. The training
condition is checked at line 4 and, if it holds, the input/output mapping is taken from the circular
buffer B and the model is updated following the BP Equations (B1)-(B8). The forecast condition is checked
at line 13 and, if it holds, the input is taken from the last p items of the buffer B and the forecast
is produced following Equations (B1)-(B3). Finally, the output vector is dedifferentiated by computing
the cumulative sum of the model output and adding the data value of the current quarter to every
component of the resulting vector.
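The three functions involved in a training step, FORWARD, BACKPROP and UPDATE, can be sketched in plain Python for the one-hidden-layer MLP of Equation (13). This is an illustrative sketch following the description in Appendix B, not the 8051 implementation: the class name, the initialization range of ±0.1, and the choice of updating biases with a plain gradient step (no weight decay) are assumptions of the example.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One-hidden-layer MLP: y = W2 . s(W1 . x + b1) + b2."""

    def __init__(self, p, h, q, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.1, 0.1) for _ in range(p)] for _ in range(h)]
        self.b1 = [0.0] * h
        self.W2 = [[rng.uniform(-0.1, 0.1) for _ in range(h)] for _ in range(q)]
        self.b2 = [0.0] * q

    def forward(self, x):
        self.x = list(x)
        self.h = [logistic(sum(w * xi for w, xi in zip(row, self.x)) + b)
                  for row, b in zip(self.W1, self.b1)]
        return [sum(w * hi for w, hi in zip(row, self.h)) + b
                for row, b in zip(self.W2, self.b2)]

    def backprop(self, y_hat, y):
        # Output gradient, then hidden gradient through W2^T and the logistic derivative.
        self.d2 = [a - t for a, t in zip(y_hat, y)]
        back = [sum(self.W2[k][j] * self.d2[k] for k in range(len(self.d2)))
                for j in range(len(self.h))]
        self.d1 = [hj * (1.0 - hj) * bj for hj, bj in zip(self.h, back)]

    def update(self, eta, eps):
        # Gradient step with learning rate eta and weight decay eps on the weights.
        for W, b, d, inp in ((self.W1, self.b1, self.d1, self.x),
                             (self.W2, self.b2, self.d2, self.h)):
            for i, di in enumerate(d):
                for j, xj in enumerate(inp):
                    W[i][j] -= eta * (di * xj + eps * W[i][j])
                b[i] -= eta * di  # assumption: biases are not decayed


def half_squared_error(y_hat, y):
    return 0.5 * sum((a - t) ** 2 for a, t in zip(y_hat, y))


# Usage: repeated steps on one fixed pattern should reduce its error.
net = TinyMLP(p=8, h=8, q=8)
x = [0.1 * i for i in range(8)]
y = [0.05 * i for i in range(8)]
loss_before = half_squared_error(net.forward(x), y)
for _ in range(200):
    y_hat = net.forward(x)
    net.backprop(y_hat, y)
    net.update(eta=0.1, eps=0.0)
loss_after = half_squared_error(net.forward(x), y)
print(loss_after < loss_before)
```

The learning rate and the number of steps in the usage lines are arbitrary illustrative values, not the ones obtained by the grid search mentioned in Algorithm 3.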

Note that the reset call at line 6 of Algorithm 2 also initializes the static variables of this algorithm.
This algorithm has more critical memory requirements, due to the circular buffer B. The length of this
buffer is p + q, and in the experiments p = q = 8; therefore, the algorithm needs 16 real
numbers, which with 32-bit real numbers corresponds to 64 bytes. Thus the total memory
needed by the whole algorithm (BP + on-line control) in the worst case adds up to 64 + 736 = 800 bytes.



Algorithm 3 TRAINANDFORECAST(v_q)

Require: v_q is a real number with the value of the current quarter. v'_q and k = 0 are static variables:
the first one stores the quarter value given in the previous function call and is initialized with an
invalid value; the second one is a counter, initialized with 0, needed to access the auxiliary buffer B.
p and q are constants with the input size and output size, respectively. B is a static circular
buffer of length p + q. For simplicity, B is indexed with any integer value i ≥ 0, assuming that
i mod (p + q) is used to access valid positions. The buffer B stores the value difference between two
time quarters. The forecast starts when the counter k is p, and the training starts when the counter
k is at least p + q. The weight matrices W_j and bias vectors b_j (both randomly initialized at start),
the activation vectors h_j and the error gradients δ_j are global variables, used by the FORWARD,
BACKPROP and UPDATE functions. η_0 is the initial learning rate, γ is the learning rate decay value, and
ε is the weight decay; all of them are constants. These last three parameters have been set by a grid
search optimization, with different values depending on the model. α = 0 is another static variable,
which counts the number of learning iterations performed.

Ensure: The algorithm trains the model using the functions FORWARD, BACKPROP and UPDATE, and
returns the forecast at the current time quarter, or NULL if it cannot be computed.

 1: if v'_q is a valid value then
 2:     B[k] = v_q − v'_q
 3:     k ← k + 1
 4:     if k ≥ p + q then   // Train when buffer B is full
 5:         x = B[(k − p − q) : (k − q − 1)]
 6:         y = B[(k − q) : (k − 1)]
 7:         ŷ = FORWARD(x)   // Following Equations (B1)-(B3)
 8:         BACKPROP(ŷ, y)   // Following Equations (B4)-(B6)
 9:         η = η_0 / (1 + α · η_0)^γ
10:         UPDATE(η, ε)   // Following Equations (B7)-(B8)
11:         α ← α + 1
12:     end if
13:     if k ≥ p then   // Compute forecast
14:         x = B[(k − p) : (k − 1)]
15:         ŷ = FORWARD(x)   // Following Equations (B1)-(B3)
16:         o = CUMSUM(ŷ) + v_q   // The output vector ŷ is dedifferentiated by computing the cumulative
                                  // sum of the vector and adding the current quarter value to all components
17:     end if
18: end if
19: v'_q ← v_q
20: return o if available, otherwise null
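The control flow of Algorithm 3 can be sketched in Python as follows. To keep the example short it uses the perceptron of Equation (12) as the model; the class names are hypothetical, and the values of the initial learning rate, the decay exponent and the weight decay are placeholders for illustration, not the values found by the grid search.

```python
class LinearModel:
    """Perceptron of Equation (12): y = W1 . x + b1, trained by gradient descent."""

    def __init__(self, p, q):
        self.W = [[0.0] * p for _ in range(q)]
        self.b = [0.0] * q

    def forward(self, x):
        self.x = list(x)
        return [sum(w * xi for w, xi in zip(row, self.x)) + b
                for row, b in zip(self.W, self.b)]

    def backprop(self, y_hat, y):
        self.delta = [a - t for a, t in zip(y_hat, y)]

    def update(self, eta, eps):
        for i, d in enumerate(self.delta):
            for j, xj in enumerate(self.x):
                self.W[i][j] -= eta * (d * xj + eps * self.W[i][j])
            self.b[i] -= eta * d


class OnlineForecaster:
    def __init__(self, model, p, q, eta0=0.01, gamma=0.5, eps=0.0):
        self.model, self.p, self.q = model, p, q
        self.eta0, self.gamma, self.eps = eta0, gamma, eps
        self.B = [0.0] * (p + q)  # circular buffer of first differences
        self.k = 0                # number of differences stored so far
        self.alpha = 0            # number of training steps performed
        self.v_prev = None        # previous quarter mean (invalid at start)

    def _window(self, start, stop):
        n = self.p + self.q
        return [self.B[i % n] for i in range(start, stop)]

    def train_and_forecast(self, v_q):
        out = None
        if self.v_prev is not None:
            self.B[self.k % (self.p + self.q)] = v_q - self.v_prev
            self.k += 1
            k, p, q = self.k, self.p, self.q
            if k >= p + q:  # train on the oldest p inputs and the newest q targets
                x, y = self._window(k - p - q, k - q), self._window(k - q, k)
                self.model.backprop(self.model.forward(x), y)
                eta = self.eta0 / (1.0 + self.alpha * self.eta0) ** self.gamma
                self.model.update(eta, self.eps)
                self.alpha += 1
            if k >= p:      # forecast from the newest p differences
                out, acc = [], v_q
                for d in self.model.forward(self._window(k - p, k)):
                    acc += d          # cumulative sum undoes the differencing
                    out.append(acc)
        self.v_prev = v_q
        return out


# Usage: a ramp rising 0.1 per quarter; forecasts should continue the ramp.
forecaster = OnlineForecaster(LinearModel(8, 8), p=8, q=8, eta0=0.1)
results = [forecaster.train_and_forecast(20.0 + 0.1 * i) for i in range(500)]
print(results[7], len(results[8]))  # None 8
```

The first forecast appears on the ninth call, once p = 8 differences are available, and training starts only after p + q = 16 differences have been stored.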


In this algorithm the training and forecasting procedures are delayed by q iterations (q time quarters).
Thus the algorithm produces forecasts using a model trained with data from q quarters in the past.

4.4. Last Remarks

As has been stated before, following an on-line method, this algorithm trains an ANN model for time 
series forecasting. In order to improve the model performance, the input data is aggregated and filtered 
by computing its mean every Q seconds. After that, first order differentiation of the data is computed, 
and the model is trained to predict the differentiated series. Figure 7 shows an illustration of input data, 
its transformation into means every Q seconds, and the differentiated series used to train the system. 
Mean and differentiated series have been plotted in the mid-point of every Q window. 



Figure 7. Illustration of input data, means and first order differentiated result.

The algorithm has been designed to receive a fixed number of delayed past values as input. This limits
the ability of the model to learn and forecast variables that depend on exogenous data (for instance,
in the case of indoor air temperature prediction, HVAC operations and other related variables should be
taken into account to ensure good model performance). Nevertheless, it is straightforward to extend the
algorithm to receive additional input data, which would be passed directly to the model as covariates.

5. Results and Discussion 

Two case studies have been completed to evaluate the proposed algorithm. The first one is a
simulation using artificially generated data, with a very large number of iterations in order to have a
baseline against which to compare model behavior and accuracy. The second one is a real application for
indoor temperature forecasting, using the SML2010 dataset [22] from the UCI Machine Learning
repository [23]. The SML2010 dataset was created by the ESAI research team (the authors of the
present paper) at the Universidad CEU Cardenal Herrera, by monitoring the data captured in a solar
domotic house that participated in the Solar Decathlon Europe 2010, a world competition on energy
efficiency. This dataset has been employed because it is tidy, i.e., it has been cleaned and structured
and is ready for analysis.

5.1. A Simulation Study

A simulation study has been carried out to explore the behavior of the algorithm in depth. The simulated
dataset contains $10^6$ data points generated from a sine function at different time points over 8500 h,
with values in degrees centigrade between 10 and 30 (more variability than real temperature values). The
time between two consecutive values was drawn at random from the range [20, 40] s. The sine values
were also randomly perturbed with noise in the range [−1.5, 1.5]. The original dataset was preprocessed
on-line by computing the mean, in order to obtain one temperature value every 15 min. Moreover, in
order to improve model generalization, first differences of the preprocessed data were calculated, yielding
the final dataset that was modeled. Note that the proposed on-line BP algorithm computes the means and
first-order differences on the fly during the training process. This dataset is shown in Figure 8.
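A data stream with these characteristics can be generated with a few lines of Python. This is only a sketch of one possible generator: the text does not state the period of the sine wave or the distribution of the noise and of the sampling intervals, so a 24 h period and uniform distributions are assumptions of the example, as are the function and parameter names.

```python
import math
import random


def simulate(n, seed=0, period_s=24 * 3600.0):
    """Return n (timestamp, temperature) pairs with irregular sampling and noise."""
    rng = random.Random(seed)
    t = 0.0
    samples = []
    for _ in range(n):
        t += rng.uniform(20.0, 40.0)  # seconds since the previous sample
        clean = 20.0 + 10.0 * math.sin(2.0 * math.pi * t / period_s)  # between 10 and 30
        samples.append((t, clean + rng.uniform(-1.5, 1.5)))           # noise in [-1.5, 1.5]
    return samples


data = simulate(1000)
print(len(data))  # 1000
```

With a mean interval of 30 s, one million samples span roughly 8300 h, which is in line with the 8500 h mentioned above.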



Figure 8. First 5000 simulated time series data and first differences.

Figures 9 and 10 show the behavior of MAE and MAE*. First, Figure 9 illustrates how the errors
evolve with the number of steps ahead. The first 15,000 observations have not been considered in this
calculation, because they belong to the period in which the algorithms had not yet
converged. Second, Figure 10 shows the smoothed MAE* behavior, computed with a window of 10 values
to reduce random noise over time, and Table 1 summarizes the MAE obtained
on the dataset for the baseline Bayesian model and the implementable ANN models. The input/output
structure was defined with 8 values as input and 8 future values as output. Thus, at each moment the
model receives as input the last eight values of the time series and must predict the next eight values
(the next two hours in steps of 15 min).

Figure 9. MAE of each step-ahead for last 15,000 for sin data.

Figure 10. Smoothed MAE*: sin data.
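The smoothing with a window of 10 values can be illustrated with a simple moving average. The sketch below assumes that the error series is already available as a list and that a trailing window is used; the text does not say whether the window was trailing or centered, and the function name is made up for the example.

```python
def moving_average(values, window=10):
    """Trailing moving average; returns one value per complete window."""
    out = []
    acc = 0.0
    for i, v in enumerate(values):
        acc += v
        if i >= window:
            acc -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            out.append(acc / window)
    return out


errors = [1.0] * 10 + [3.0] * 10
smoothed = moving_average(errors)
print(smoothed[0], smoothed[5], smoothed[-1])  # 1.0 2.0 3.0
```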


Figure 10 and Table 1 compare the Bayesian baseline with the
performance obtained by the linear and MLP models. The Bayesian baseline
obtains low errors almost from the first iteration, because it makes better use of the available
information. The data structure handled by this baseline model is larger than that of the other two
methods: the baseline model operates in each iteration with an n × p matrix X, with n = p + 1, and an
n × 1 vector $Y_{\cdot j}$, whereas the ANN models handle only a 1 × p vector x and a 1 × 1 vector y.
Regarding the ANN models, the linear model learns faster than the MLP, but the MLP achieves
a better error in the long term. As expected, the more complex the model, the more difficult the learning
process, but it achieves results similar to the Bayesian baseline in the long run.

Table 1. Comparison between the Bayesian baseline and the ANN models. All methods have
p = 8 inputs and q = 8 outputs. The MLP has a hidden layer with h = 8 neurons.

Method                         Min.    Q1      Q2      Mean    Q3      Max.
Baseline (Bayesian standard)   0.047   0.248   0.442   0.528   0.720   4.227
Lin                            0.036   0.368   0.632   0.648   0.877   2.991
MLP                            0.046   0.323   0.553   0.662   0.871   3.708

5.2. Real Application: Temperature Forecasting in a Solar House

As mentioned before, the School of Technical Sciences at the University CEU Cardenal
Herrera (CEU-UCH) participated in the 2010 and 2012 Solar Decathlon Europe competitions, building
two solar-powered houses known as SMLhouse and SMLsystem respectively (Figure 11). One of the
technologies integrated in those houses was a monitoring system developed to collect all the data related
to energy consumption and other variables such as indoor/outdoor temperature, CO2, etc. A tidy database
was created, and it has been exploited to develop different ANN models for prediction purposes in
previous research projects.



Figure 11. Solar-powered houses: SMLhouse (left) and SMLsystem (right).

As stated before, a tidy dataset with 673 h (28 days) of real temperature data recorded every 15 min
is available (2688 equally spaced temperature values), and it has been used to study the performance of
the overall system. This dataset and its first differences are shown in Figure 12.

Figure 12. SMLhouse time series data.

As in the study of the simulated data, Figures 13 and 14 show the behavior of MAE and MAE*.
Figure 13 illustrates how the errors evolve with the number of steps ahead, and Figure 14
shows the smoothed MAE* behavior, computed with a window of 10 values to reduce random
noise over time. Table 2 summarizes the MAE obtained on the dataset for the baseline
Bayesian model and the implementable ANN models. The input/output structure was defined with
8 values as input and 8 future values as output: at each moment the model receives as
input the last eight values of the time series and must predict the next eight (the next two hours in
steps of 15 min), as described for the baseline before. Because the observation period is short
compared to that of the simulated data, the ANN models do not converge to the results of the baseline
model. However, as shown in the simulation study, the errors are expected to become equal as
time evolves. In any case, although the Bayesian baseline
again obtains low errors, the absolute differences between this method and the ANN
models are negligible in practical terms.

Figure 13. MAE of each step-ahead for SMLhouse forecasts.

Figure 14. Smoothed MAE*: SMLhouse.



The experimental results seem very promising, as we were able to implement a complex
algorithm in a simple hardware device to predict time series accurately. In previous research, published in
the journal Energy and Buildings [24], this kind of algorithm, in analogous situations but with additional
variables, obtained better predictions and converged to low errors in shorter periods of time.
This leads us to consider using the present algorithm and the
same ideas to develop a new one that employs variables similar to those of the previous work, to be
implemented in hardware devices with more resources while keeping the restriction of using cheap and
small-sized microcontrollers.

Table 2. Comparison between the Bayesian baseline and the ANN models. All methods have
p = 8 inputs and q = 8 outputs.

Method                         Min.    Q1      Q2      Mean    Q3      Max.
Baseline (Bayesian standard)   0.008   0.087   0.141   0.184   0.222   4.295
Lin ANN                        0.011   0.213   0.309   0.373   0.432   1.978
MLP ANN                        0.013   0.233   0.466   0.527   0.734   2.109


6. Conclusions 

This paper describes how to implement an artificial neural network in a low-cost system-on-chip to
develop an autonomous intelligent wireless sensor network. The model is able to produce predictions of
the temperature coming from four sensor nodes, based on an on-line learning approach. The idea behind this
project is to evaluate whether it is possible to develop, in a system with very few resources such as the
8051 MCU, the source code needed to integrate a neural network that learns a time series on-line,
i.e., without any historical database. Obviously, not every kind of problem can be addressed with this
technology and approach. Nevertheless, for some physical measurements that can be represented as a
stationary time series, it is possible to apply an on-line learning paradigm, which makes it feasible to
generate predictions of the time series within reasonable time windows, a few days in the present case
study. This means that it is conceivable to place a small, low-cost intelligent system in a totally
unknown environment to learn its dynamics very rapidly, using wireless technology.

The on-line learning approach is suboptimal in terms of model accuracy. The proposed algorithm
is able to learn accurate forecast models but, as stated in Section 3.2, learning sequentially
could harm the model learning. Stochastic gradient descent algorithms are based on random sampling
of the training dataset, and minor changes to the proposed algorithm would allow some form
of random sampling of the real-time data stream. Implementing these ideas in more complex
devices would allow the complexity of the model and of the algorithm to be increased, in time and space,
improving the experimental results shown in this work. An idea for the future is to scale this project to
more complex architectures, starting with an efficient algorithm as a baseline.

Finally, some issues would arise when using the proposed models to predict the temperature of a
room where an HVAC system is operating. Air temperature would be affected by HVAC operations, among
other exogenous variables, so forecasting with only a fixed number of past delayed values would not be
enough. However, this problem can be tackled by extending the model with additional input information
about the operations performed by the HVAC system and other exogenous variables that determine
the room temperature.

A. Standard Bayesian Linear Model Estimation

This section describes some aspects of the standard Bayesian estimation of linear regression model
parameters on which our work is based.

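A.1. Non-Informative Prior Distributions

The linear model can be written as:

$$y \mid w, \epsilon^2, X \sim N(Xw, \epsilon^2 I) \qquad (A1)$$

The non-informative prior distribution most commonly used for linear regression is:

$$P(w \mid \epsilon^2) \propto \frac{1}{\epsilon^2} \qquad (A2)$$

The posterior distribution of w given $\epsilon^2$ and the marginal posterior distribution of $\epsilon^2$
can be obtained analytically as follows:

$$w \mid \epsilon^2, Y \sim \mathrm{Normal}(\hat{w}, V_w \epsilon^2) \qquad (A3)$$

$$\epsilon^2 \mid Y \sim \mathrm{Inverse}\text{-}\chi^2(n - p, s^2) \qquad (A4)$$

The parameters of these distributions are estimated as follows:

$$\hat{w} = (X^T X)^{-1} X^T y \qquad (A5)$$

$$V_w = (X^T X)^{-1} \qquad (A6)$$

$$s^2 = \frac{(y - X\hat{w})^T (y - X\hat{w})}{n - p} \qquad (A7)$$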


The superscript T denotes matrix transposition. Parameter estimation requires n > p, and n = p + 1 has
been used in the estimation process (p × (p + 1) data).

If q values must be predicted at each time step, q linear models must be estimated. Thus, the q estimates
of the vector w ($\hat{w}_i$ for i = 1, ..., q) can be arranged as a matrix W that contains in each of its
q columns the estimated parameter vector $\hat{w}_i$ for each prediction:

$$W_{\cdot i} = \hat{w}_i$$

In the same way, the vector of predictions y becomes a matrix Y with q columns. The column vector
$W_{\cdot i}$ corresponds to the parameter vector of the linear model with response vector $Y_{\cdot i}$.
All of these models share the same predictor matrix X.

At time t = p + p + 1 a first estimation W[0] is available and can be used to make predictions at time
t + 1. The predictive distribution of $\hat{Y}$, given a new set of predictors $X_p$, has mean:

$$\hat{Y}_{\cdot i} = X_p W[0]_{\cdot i}, \quad i = 1, \ldots, q \qquad (A8)$$

with variance:

$$\mathrm{var}(\hat{Y}_{\cdot i} \mid \epsilon^2, Y) = (I + X_p V_w X_p^T)\, \epsilon^2 \qquad (A9)$$

This requires matrix products and matrix inversions of dimension (p × p).


A.2. Informative Prior Distribution


Furthermore, at the following time points t, if the last parameter estimates are W[t − 1], $s^2[t-1]$ and
$V_w[t-1]$ and new data X[t] and Y[t] are available, the parameters at this point in time can
be re-estimated by treating the prior as additional data points and then weighting their contribution to
the new estimation [20]. To perform the computations, for each prediction value $Y[t]_{\cdot i}$
(i = 1, ..., q) it is necessary to construct a new vector of observations $Y^*_{\cdot i}$ from the new data
and the last parameter estimates, a predictor matrix $X^*$, and a weight matrix $\Sigma$ based on the
previous variance estimates, as follows:

$$Y^*_{\cdot i} = \begin{pmatrix} Y[t]_{\cdot i} \\ W[t-1]_{\cdot i} \end{pmatrix}, \quad X^* = \begin{pmatrix} X[t] \\ I_p \end{pmatrix}, \quad \Sigma = \begin{pmatrix} I_n & 0 \\ 0 & V_w[t-1]\, s^2[t-1] \end{pmatrix}$$

The new parameter estimates at time t can be written as:

$$W[t]_{\cdot i} = (X^{*T} \Sigma^{-1} X^*)^{-1} X^{*T} \Sigma^{-1} Y^*_{\cdot i}, \quad i = 1, \ldots, q$$

$$V_w[t] = (X^{*T} \Sigma^{-1} X^*)^{-1}$$

Moreover, $s_0^2$ is the variance estimate at time t − 1 and $n_0$ its degrees of freedom, and $n_1$ is the
degrees of freedom of the variance estimate from the new data. The computational cost is higher than in
the first step, with more matrix products and more matrix inversions. The standard Bayesian parameter
estimation is a simple process, but one with high resource requirements.

B. On-Line Back-Propagation Derivation for ANN Models 

This section mathematically formalizes the BP algorithm, which is widely known in the ANN
literature. These equations are given for completeness and to help in understanding the memory
requirements stated in Section 4.

For any ANN model, with zero or more hidden layers, the procedure of computing its output $\hat{y}$ given
its inputs x is known as the forward step. The following equations show the computation needed during the
forward step:


$$h_0 = x \qquad (B1)$$

$$h_j = s(W_j \cdot h_{j-1} + b_j), \quad \text{for } 1 \le j < N \qquad (B2)$$

$$\hat{y} = W_N \cdot h_{N-1} + b_N \qquad (B3)$$


where $s(z) = \frac{1}{1 + \exp(-z)}$ is the logistic function and N the number of layers in the network,
that is, the number of hidden layers plus one for the output layer. The forward step computes the
activations of all hidden layers and of the output layer $\hat{y}$. After the forward step, it is possible
to compute the loss $L(\hat{y}, y)$ of the ANN output with respect to the desired output y using the mean
square error. The derivative of this loss function with respect to every output and hidden layer is
computed by the backprop step by means of the following equations:

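$$L(\hat{y}, y) = \frac{1}{2} \lVert \hat{y} - y \rVert^2 + \frac{\epsilon}{2} \sum_{j=1}^{N} \sum_{w \in W_j} w^2 \qquad (B4)$$

$$\delta_N = \hat{y} - y \qquad (B5)$$

$$\delta_j = h'_j \circ (W_{j+1}^T \cdot \delta_{j+1}), \quad \text{for } 1 \le j < N \qquad (B6)$$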

where A ∘ B denotes the component-wise product of two vectors and $h'_j$ the derivative of the logistic
function, which corresponds to $h'_j = h_j \circ (1 - h_j)$. After the forward and backprop steps, all
activations $h_j$ and the hidden and output layer error gradients $\delta_j$ are available. Thus, the gradient
of the loss function with respect to the weight matrices and bias vectors can be computed. Following this
gradient, all weights and biases are updated. This phase is called the update step, and its equations are:

$$W_j^{(e+1)} = W_j^{(e)} - \eta \left( \delta_j \otimes h_{j-1} + \epsilon W_j^{(e)} \right), \quad \text{for } 1 \le j \le N \qquad (B7)$$

$$b_j^{(e+1)} = b_j^{(e)} - \eta\, \delta_j, \quad \text{for } 1 \le j \le N \qquad (B8)$$

where A ⊗ B denotes the outer product of two vectors.